A phylotranscriptomic dataset of angiosperm species under cold stress

Angiosperms are one of the most diverse and abundant plant groups and are widely distributed on Earth, from tropical to temperate and polar zones. This wide distribution may be attributed to the evolution of sophisticated mechanisms of environmental adaptability, including cold tolerance. Since the development of high-throughput sequencing, transcriptomics has been widely used to gain insights into the molecular mechanisms by which plants respond to cold stress. However, previous studies generally focused on one or two species, and comparative transcriptome analyses of multiple species responding to cold stress have been limited. In this study, we selected 11 representative angiosperm species, performed phylotranscriptome experiments at four time points before and after cold stress, and present a profile of cold-induced transcriptome changes in angiosperms. Our multispecies cold-responsive RNA-seq datasets provide valuable references for exploring conserved and evolutionary mechanisms of angiosperms in adaptation to cold stress.


[Figure panel labels: Eudicots; Monocots; a total of 11 angiosperms; cold treatment (4 °C); ambient temperature (25 °C)]

www.nature.com/scientificdata

Methods
Plant materials and growth conditions. Seedlings of 11 selected representative angiosperm species, including six eudicots (A. thaliana, B. pendula, P. trichocarpa, C. illinoinensis, G. max, and C. sativus) and five monocots (O. sativa, S. italica, H. vulgare, Z. mays, and P. edulis) (Fig. 1a), were cultured in an artificial climate chamber at 25 °C under a 16/8 h light/dark photoperiod. For A. thaliana, three-week-old seedlings were used for cold treatment. For the other species, young seedlings ~30 cm in height were used.

Cold stress treatment. Under cold stress, gene expression is reprogrammed to form a hierarchical regulatory network consisting of rapid, early, and late cold-responsive genes. To capture these genes, we subjected the 11 selected species to cold stress for four durations (0, 2, 24, and 168 h). For each species, seedlings with relatively uniform growth and physiological state were selected and divided into four groups (Group 1 to Group 4). To ensure that all four groups could be harvested at the same developmental stage and at the same time of day, the treatments were staggered: the 168 h group was placed in the artificial climate chamber at 4 °C one week (168 h) before harvest, and the 24 h and 2 h groups were placed at 4 °C 24 h and 2 h before harvest, respectively; the 0 h group received no cold treatment. After the cold stress treatments, we collected the fourth expanded leaves of the treated seedlings, which are generally considered mature, healthy leaves at similar developmental stages. For each cold treatment, three biological replicates were collected (Fig. 1b).
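The staggered design described above amounts to subtracting each treatment duration from a common harvest time. The following is a minimal sketch of that arithmetic, not the authors' procedure; the function name `cold_start_times` and the harvest time are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Cold-treatment durations in hours, as listed in the text.
DURATIONS_H = (0, 2, 24, 168)


def cold_start_times(harvest, durations_h=DURATIONS_H):
    """Map each treatment duration to the time its group must enter 4 °C
    so that all groups are harvested at the same moment."""
    return {h: harvest - timedelta(hours=h) for h in durations_h}


# Hypothetical harvest time (assumption, not from the source).
harvest = datetime(2021, 6, 8, 10, 0)
for hours, start in sorted(cold_start_times(harvest).items(), reverse=True):
    print(f"{hours:>3} h group: start cold treatment at {start:%Y-%m-%d %H:%M}")
```

The 0 h entry coincides with the harvest time itself, which corresponds to the control group that receives no cold treatment.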
RNA extraction, library construction, and sequencing. Total RNA from the collected leaves of each species was isolated and purified using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer's procedure. The RNA amount and purity of each sample were assessed with a NanoDrop ND-1000 (NanoDrop Technologies, Wilmington, DE, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). Poly(A) RNA was purified from total RNA using poly-T oligo-attached magnetic beads to generate strand-specific cDNA libraries containing inserts of approximately 150–200 bp. In total, 132 cDNA libraries from 11 species at four time points before and after cold stress (0, 2, 24, and 168 h) were constructed for transcriptome analysis. The libraries were sequenced on an Illumina NovaSeq 6000 sequencing system (2 × 150 bp paired-end reads) at LC-Bio Technology Co., Ltd. (Hangzhou, China) according to the manufacturer's instructions (Fig. 1c).
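The total of 132 libraries follows from 11 species × 4 time points × 3 biological replicates. As a minimal sketch, the design can be enumerated as below; the label scheme (for example `A_thaliana_0h_rep1`) is a hypothetical convention for illustration and is not the naming used in the deposited data.

```python
from itertools import product

# Species as listed in Methods: six eudicots followed by five monocots.
SPECIES = [
    "A. thaliana", "B. pendula", "P. trichocarpa",
    "C. illinoinensis", "G. max", "C. sativus",
    "O. sativa", "S. italica", "H. vulgare", "Z. mays", "P. edulis",
]
TIME_POINTS_H = [0, 2, 24, 168]
REPLICATES = [1, 2, 3]


def library_labels():
    """Enumerate one label per library (hypothetical naming scheme)."""
    return [
        f"{species.replace('. ', '_')}_{hours}h_rep{rep}"
        for species, hours, rep in product(SPECIES, TIME_POINTS_H, REPLICATES)
    ]


labels = library_labels()
print(len(labels))  # 11 species x 4 time points x 3 replicates
```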

Data Records
The 132 RNA-seq datasets described above have been deposited in NCBI under BioProject accession number PRJNA767196 20. The read-count data matrix of cold-treated samples for each of the 11 species is available at the figshare data repository (https://doi.org/10.6084/m9.figshare.22643245.v1) 21. DEGs between each cold-treatment time point and normal conditions for each species are also available at figshare (https://doi.org/10.6084/m9.figshare.22643074.v1) 22.

Technical Validation
Data filtering and quality control. The raw data in fastq format were processed with Trimmomatic v0.39 23 to remove Illumina adapter contamination and low-quality bases. After filtering, the quality of the clean reads in each RNA-seq dataset was assessed using FastQC 24 […] The average GC content of the dataset was 47.84% (Dataset 1). Overall, the filtering results and the quality of the clean reads indicated that sequencing performed adequately and produced a series of high-quality RNA-seq datasets.

[…] 30 to detect the quality of the reference genomes. The complete BUSCO values of the 11 reference genomes ranged from 86.3% to 99.3%, and 10 of the genomes exceeded 90% (Fig. 2a), indicating that the reference genomes were appropriate and of high quality. To determine the mapping ratio of the transcriptomes, clean reads from each sample were mapped to the corresponding reference genome with HISAT2 v2. […] (Fig. 3), suggesting high consistency among the biological replicates.

Identification of differentially expressed genes. Differentially expressed genes (DEGs) between two time points before and after cold treatment (0 h versus 2, 24, or 168 h) were obtained using edgeR 34, DESeq2 35, and Ballgown 36. In brief, genes with a mean TMM ≥ 1 across the compared samples were considered DEGs if at least two of the three methods reported an adjusted P-value or false discovery rate (FDR) < 0.05 and an absolute fold change ≥ 2. For each species, the number of DEGs at each cold-treatment time point (2, 24, and 168 h) compared with the control (0 h) was calculated (Fig. 4a,b, Dataset 4). The metadata of the DEGs are available at figshare under "DEG tables of 11 angiosperm species under different time points of cold treatments" 22.
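The DEG criteria above can be read as a two-of-three vote with an expression filter. The sketch below illustrates that reading and is not the authors' pipeline: the class `MethodCall`, the functions `method_supports` and `is_deg`, and the example values are all hypothetical, and "absolute fold change ≥ 2" is assumed to mean |log2 fold change| ≥ 1 (at least two-fold up or down).

```python
import math
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MethodCall:
    """Result for one gene from one method (hypothetical container)."""
    log2_fc: float  # log2 fold change, treated versus 0 h control
    adj_p: float    # adjusted P-value or FDR reported by the method


def method_supports(call, fc_cutoff=2.0, alpha=0.05):
    """True if one method reports at least fc_cutoff-fold change, up or down,
    at adjusted P / FDR below alpha."""
    return abs(call.log2_fc) >= math.log2(fc_cutoff) and call.adj_p < alpha


def is_deg(mean_tmm, calls: Mapping[str, MethodCall],
           min_tmm=1.0, min_methods=2):
    """Apply the expression filter, then require support from min_methods."""
    if mean_tmm < min_tmm:
        return False
    votes = sum(method_supports(call) for call in calls.values())
    return votes >= min_methods


# Made-up values for a single gene, for illustration only.
example = {
    "edgeR": MethodCall(log2_fc=1.8, adj_p=0.001),
    "DESeq2": MethodCall(log2_fc=1.5, adj_p=0.010),
    "Ballgown": MethodCall(log2_fc=0.7, adj_p=0.200),
}
print(is_deg(mean_tmm=12.4, calls=example))
```

In this example two of the three methods pass both thresholds, so the gene is called a DEG even though the third method does not support it.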
The overlap of DEGs among the cold-treatment time points was examined with Venn diagrams (Fig. 4c). Additionally, heatmaps of DEG expression in each species were plotted in R (Fig. 4d, Dataset 5).
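The regions of a Venn diagram correspond to groups of genes that share exactly the same set of time points. A minimal sketch of that partition using set operations is given below; the function `venn_regions` and the gene identifiers are hypothetical, and this is not the tool used to draw Fig. 4c.

```python
def venn_regions(deg_sets):
    """Partition genes by the exact combination of time points
    at which they were called as DEGs."""
    regions = {}
    for gene in set().union(*deg_sets.values()):
        membership = frozenset(
            name for name, genes in deg_sets.items() if gene in genes
        )
        regions.setdefault(membership, set()).add(gene)
    return regions


# Made-up gene identifiers, for illustration only.
degs = {
    "2h": {"g1", "g2", "g3"},
    "24h": {"g2", "g3", "g4"},
    "168h": {"g3", "g4", "g5"},
}
for membership, genes in venn_regions(degs).items():
    print(sorted(membership), len(genes))
```

Each key is the combination of time points defining one region, so the region sizes sum to the number of distinct DEGs across all time points.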

Code availability
The software and versions used for RNA-seq analysis are described in the Methods. No custom code was used to generate or process the data described in the manuscript.